# Low-latency inference

### Phi Mini MoE Instruct GGUF
gabriellarson · MIT · Large Language Model, English · 2,458 downloads · 1 like
Phi-mini-MoE is a lightweight Mixture-of-Experts (MoE) model for English business and research use, well suited to resource-constrained, low-latency deployments.

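What keeps an MoE model like this cheap at inference time is that each token is routed to only a few experts. A minimal numpy sketch of top-2 routing for a single token (illustrative only; Phi-mini-MoE's actual router, expert count, and expert sizes differ):

```python
import numpy as np

rng = np.random.default_rng(0)

n_experts, d_model, top_k = 4, 8, 2
x = rng.standard_normal(d_model)                    # one token's hidden state
W_gate = rng.standard_normal((d_model, n_experts))  # router weights (hypothetical)
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]

# Router: score every expert, keep only the top-k scores.
logits = x @ W_gate
top = np.argsort(logits)[-top_k:]
weights = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over selected

# Output is the weighted sum of the selected experts' outputs; the other
# experts are never evaluated, so compute scales with k, not n_experts.
y = sum(w * (x @ experts[i]) for w, i in zip(weights, top))
print(y.shape)  # (8,)
```

Only `top_k` of the `n_experts` matrix multiplies run per token, which is the source of the low-latency claim.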
### Sarvam Finetune
jk12p · Large Language Model, Transformers · 112 downloads · 1 like
A transformers model published on the Hugging Face Hub; its model card does not yet describe its function or details.

### Unlearn Tofu Llama 3.2 1B Instruct Forget10 SimNPO Lr1e 05 B4.5 A1 D0 G0.25 Ep5
open-unlearning · Large Language Model, Transformers · 153 downloads · 1 like
A transformers model uploaded to the Hugging Face Hub; detailed information has not yet been provided.

### Qwen3 14b Ug40 Pretrained
jq · Large Language Model, Transformers · 1,757 downloads · 1 like
An automatically generated transformers model card with no specific model information.

### Sn29 Q1m4 Dx9i
mci29 · Large Language Model, Transformers · 1,797 downloads · 1 like
A transformers model published on the Hugging Face Hub; specific information has not yet been provided.

### Mistral Small 3.1 24B Instruct 2503 Quantized.w8a8
RedHatAI · Apache-2.0 · Safetensors, Supports Multiple Languages · 833 downloads · 2 likes
An INT8-quantized build of Mistral-Small-3.1-24B-Instruct-2503, optimized by Red Hat and Neural Magic for fast, low-latency serving.

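A "w8a8" scheme stores weights (and activations) as 8-bit integers plus a floating-point scale, roughly quartering memory versus FP32. A minimal sketch of symmetric per-tensor INT8 round-trip quantization (an illustration of the idea, not Red Hat's actual compression recipe):

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor quantization: int8 values plus one FP scale."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)  # stand-in weight matrix

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print(q.nbytes / w.nbytes)  # 0.25: int8 takes a quarter of the FP32 bytes
```

Rounding error is bounded by half the scale, which is why accuracy typically drops only slightly while memory and bandwidth needs fall sharply.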
### Mistral Small 3.1 24B Instruct 2503 FP8 Dynamic
RedHatAI · Apache-2.0 · Safetensors, Supports Multiple Languages · 2,650 downloads · 5 likes
A 24B-parameter conditional-generation model on the Mistral3 architecture with FP8 dynamic quantization, suited to multilingual text generation and visual understanding.

### Mistral Small 3.1 24B Instruct 2503
chutesai · Apache-2.0 · Image-to-Text, Supports Multiple Languages · 2,035 downloads · 0 likes
Mistral Small 3.1 is a 24B-parameter multimodal language model with visual understanding and a 128k-token context window, applicable to a wide range of tasks.

### Sana Sprint 1.6B 1024px
Efficient-Large-Model · Image Generation, Supports Multiple Languages · 475 downloads · 12 likes
SANA-Sprint is an ultra-efficient text-to-image diffusion model that cuts inference from 20 steps down to 1-4 while maintaining top-tier quality.

### Canary 1b Flash
nvidia · Speech Recognition, Supports Multiple Languages · 125.22k downloads · 186 likes
NVIDIA NeMo Canary Flash is a family of multilingual, multitask models with state-of-the-art results on multiple speech benchmarks, supporting automatic speech recognition and translation across four languages.

### Mistral Small 24B Instruct 2501 Quantized.w8a8
RedHatAI · Apache-2.0 · Large Language Model, Safetensors, Supports Multiple Languages · 158 downloads · 1 like
An INT8-quantized 24B Mistral instruction-tuned model; quantization substantially reduces GPU memory requirements and raises computational throughput.

### Phi 4 Multimodal Instruct
Robeeeeeeeeeee · MIT · Multimodal Fusion, Transformers, Supports Multiple Languages · 21 downloads · 1 like
Phi-4-multimodal-instruct is a lightweight open multimodal foundation model built on the language, vision, and speech research and datasets behind the Phi-3.5 and Phi-4 models. It accepts text, image, and audio inputs, produces text outputs, and supports a 128K-token context.

### Quickmt Zh En
quickmt · Machine Translation, Supports Multiple Languages · 23 downloads · 1 like
A fast and accurate neural machine translation model for Chinese-to-English translation.

### Whisper Large V3 Distil Multi7 V0.2
bofenghuang · MIT · Speech Recognition, Transformers, Supports Multiple Languages · 119 downloads · 1 like
A distilled multilingual Whisper model covering automatic speech recognition in seven European languages, with code-switching support.

### Bart Large Mnli Openvino
Smashyalts · MIT · Text Classification · 16 downloads · 0 likes
An OpenVINO-optimized build of the facebook/bart-large-mnli model for zero-shot text classification.

### Vectorizer.guava
sinequa · Text Embedding, Supports Multiple Languages · 204 downloads · 1 like
A vectorizer developed by Sinequa that generates embedding vectors from input passages or queries for sentence-similarity and retrieval tasks.

### Kotoba Whisper V2.0
kotoba-tech · Apache-2.0 · Speech Recognition, Transformers, Japanese · 8,108 downloads · 60 likes
Kotoba-Whisper is a Japanese speech-recognition model distilled from Whisper large-v3, developed by Asahi Ushio in collaboration with Kotoba Technologies; it runs inference 6.3x faster than the original.

### Show O
showlab · MIT · Text-to-Video · 225 downloads · 16 likes
Show-o is a PyTorch-based any-to-any model that converts between multiple input and output modalities.

### Zamba2 2.7B
Zyphra · Apache-2.0 · Large Language Model, Transformers · 2,550 downloads · 77 likes
Zamba2-2.7B is a hybrid of state-space and Transformer blocks, combining Mamba2 modules with a shared attention module for high performance at low latency.

### Snowflake Arctic Embed M V1.5
Snowflake · Apache-2.0 · Text Embedding · 219.46k downloads · 58 likes
Snowflake Arctic Embed M v1.5 is an efficient sentence-embedding model focused on sentence similarity and feature extraction.

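Embedding models like this one map sentences to vectors whose cosine similarity tracks semantic closeness, which is what retrieval builds on. A toy sketch with made-up 3-d vectors (in practice the vectors come from the encoder, e.g. via sentence-transformers; the document names and values here are hypothetical):

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: dot product of the two vectors over their norms."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings standing in for encoder outputs.
query = np.array([0.9, 0.1, 0.2])
docs = {
    "pricing page": np.array([0.8, 0.2, 0.1]),
    "cat photos":   np.array([0.0, 0.9, 0.4]),
}

# Rank documents by similarity to the query and take the best match.
scores = {name: cosine(query, v) for name, v in docs.items()}
best = max(scores, key=scores.get)
print(best)  # pricing page
```

The same argmax-over-cosine step is the final stage of any embedding-based retrieval pipeline; only the vector source changes.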
### Mobileclip B LT OpenCLIP
apple · Text-to-Image · 774 downloads · 9 likes
MobileCLIP-B (LT) is an efficient image-text model from Apple; trained with multi-modal reinforced training, it delivers fast zero-shot image classification and outperforms comparable models.

### Mobileclip B OpenCLIP
apple · Text-to-Image · 715 downloads · 3 likes
MobileCLIP-B is an efficient image-text model that uses multi-modal reinforced training for fast inference and strong zero-shot image classification.

### Mobileclip S2 OpenCLIP
apple · Text-to-Image · 99.74k downloads · 6 likes
MobileCLIP-S2 is an efficient text-image model that achieves fast zero-shot image classification through multi-modal reinforced training.

### Mobileclip S0 Timm
apple · Text-to-Image · 532 downloads · 10 likes
MobileCLIP-S0 is an efficient image-text model trained with multi-modal reinforced training, trading little accuracy for large gains in speed and model size.

### Llm Compiler 7b Ftd
facebook · Other · Large Language Model, Transformers · 106 downloads · 26 likes
LLM Compiler is a state-of-the-art model built on Code Llama for code optimization and compiler reasoning, far surpassing existing public models at understanding compiler optimizations.

### Kotoba Whisper V1.1
kotoba-tech · Apache-2.0 · Speech Recognition, Transformers, Japanese · 476 downloads · 33 likes
Kotoba-Whisper-v1.1 is a Whisper-based Japanese speech-recognition model with added punctuation and timestamp post-processing.

### Meta Llama 3 8B Instruct Function Calling
Trelis · Apache-2.0 · Large Language Model, Transformers, English · 499 downloads · 44 likes
A Llama 3 instruct model fine-tuned for function calling, available for commercial use under the Llama 3 Community License.

### Kotoba Whisper V1.0
kotoba-tech · Apache-2.0 · Speech Recognition, Transformers, Japanese · 2,397 downloads · 53 likes
Kotoba-Whisper is a collection of distilled Whisper models for Japanese speech recognition, developed by Asahi Ushio in collaboration with Kotoba Technologies; it is 6.3x faster than Whisper large-v3 at a comparably low error rate.

### Mamba 370m Hf
state-spaces · Large Language Model, Transformers · 6,895 downloads · 14 likes
Mamba is an efficient language model built on a state space model (SSM), able to model sequences in linear time.

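The linear-time claim comes from the SSM recurrence: the model carries a fixed-size state through the sequence, one update per step, instead of attending over all previous tokens. A numpy sketch of the discrete recurrence h_t = A h_(t-1) + B x_t, y_t = C h_t with a diagonal transition (Mamba's actual blocks use input-dependent, selective parameters; A, B, C here are made up for illustration):

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Discrete linear SSM: one state update per step -> O(len(x)) time."""
    h = np.zeros_like(A)           # hidden state of fixed size d_state
    ys = []
    for x_t in x:                  # single pass over the sequence
        h = A * h + B * x_t        # state transition (elementwise: diagonal A)
        ys.append(float(C @ h))    # readout
    return np.array(ys)

d_state = 4
A = np.full(d_state, 0.9)          # per-step decay (hypothetical values)
B = np.ones(d_state)
C = np.ones(d_state) / d_state

x = np.array([1.0, 0.0, 0.0, 0.0])  # an impulse input
y = ssm_scan(x, A, B, C)
print(y)  # geometric decay: 1.0, 0.9, 0.81, 0.729
```

Note that memory stays constant in sequence length, which is why SSMs suit long contexts and low-latency generation.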
### Codellama 70B Python GPTQ
TheBloke · Large Language Model, Transformers, Other · 89 downloads · 19 likes
CodeLlama 70B Python is a Llama 2-based large language model specialized for Python, optimized for code generation and completion.

### Codellama 70B Instruct GGUF
TheBloke · Large Language Model, Other · 2,703 downloads · 57 likes
CodeLlama 70B Instruct is a large Llama 2-based code model optimized for code understanding and generation.

### Yi Ko 6b Text2sql
shangrilar · Large Language Model, Transformers · 1,918 downloads · 2 likes
A transformers model published on the Hugging Face Hub; its functions and features have not yet been documented.

### Mobilevlm 3B
mtgv · Apache-2.0 · Text-to-Image, Transformers · 346 downloads · 13 likes
MobileVLM is a fast, capable multimodal vision-language model designed for mobile devices, supporting efficient cross-modal interaction.

### Faster Whisper Base.en
Systran · MIT · Speech Recognition, English · 367.44k downloads · 4 likes
A CTranslate2 conversion of the Whisper base.en model for English speech recognition.

### Faster Whisper Medium.en
Systran · MIT · Speech Recognition, English · 65.17k downloads · 3 likes
A CTranslate2 conversion of the OpenAI Whisper medium.en model for efficient automatic speech recognition.

### Faster Whisper Large V3
Systran · MIT · Speech Recognition, Supports Multiple Languages · 713.48k downloads · 376 likes
Whisper large-v3 is OpenAI's large multilingual automatic speech recognition (ASR) model, here converted to CTranslate2 for speech-to-text in many languages.

### Vectorizer.vanilla
sinequa · Text Embedding, Transformers, English · 634 downloads · 0 likes
A vectorizer developed by Sinequa that generates embedding vectors from input passages or queries for sentence-similarity and retrieval tasks.

### Vectorizer V1 S Multilingual
sinequa · Text Embedding, Transformers, Supports Multiple Languages · 322 downloads · 0 likes
A multilingual Sinequa vectorizer that generates embedding vectors for input passages or queries, used for similarity computation and information retrieval.

### Vectorizer V1 S En
sinequa · Text Embedding, Transformers, English · 304 downloads · 0 likes
A Sinequa vectorizer that generates embedding vectors from paragraphs or queries for sentence similarity and feature extraction.

### Stt Kr Conformer Ctc Medium
SungBeom · Apache-2.0 · Speech Recognition, Korean · 176 downloads · 9 likes
A Korean automatic speech recognition model built on the Conformer architecture, optimized for streaming and strong in specific domains such as customer-service audio.

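The CTC in this model's name refers to its decoding scheme: the network emits one label (or a blank) per audio frame, and decoding collapses consecutive repeats and drops blanks. A minimal greedy CTC decoder over hypothetical per-frame labels (illustrative; production systems typically beam-search over the model's actual Korean token set):

```python
import itertools

BLANK = "_"  # stand-in for the CTC blank symbol

def ctc_greedy_decode(frames: list) -> str:
    """Collapse consecutive repeated labels, then remove blanks."""
    collapsed = (label for label, _ in itertools.groupby(frames))
    return "".join(label for label in collapsed if label != BLANK)

# Hypothetical per-frame argmax labels for a short utterance.
frames = ["h", "h", "_", "e", "l", "_", "l", "l", "o", "o"]
print(ctc_greedy_decode(frames))  # hello
```

The blank between the two "l" runs is what lets CTC produce genuine double letters: repeats are merged only when no blank separates them.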
© 2025 AIbase